Correction to: Recommendations for performance optimizations when using GATK3.8 and GATK4.

Heldenbrand, Jacob R; Baheti, Saurabh; Bockol, Matthew A; Drucker, Travis M; Hart, Steven N; Hudson, Matthew E; Iyer, Ravishankar K; Kalmbach, Michael T; Kendig, Katherine I; Klee, Eric W; Mattson, Nathan R; Wieben, Eric D; Wiepert, Mathieu; Wildman, Derek E; Mainzer, Liudmila S.

BMC Bioinformatics ; 20(1): 722, 2019 Dec 17.

Article in English | MEDLINE | ID: mdl-31847808

ABSTRACT

Following publication of the original article [1], the author explained that Table 2 is displayed incorrectly. The correct Table 2 is given below. The original article has been corrected.

Recommendations for performance optimizations when using GATK3.8 and GATK4.

BMC Bioinformatics ; 20(1): 557, 2019 Nov 08.

Article in English | MEDLINE | ID: mdl-31703611

ABSTRACT

BACKGROUND: Use of the Genome Analysis Toolkit (GATK) continues to be the standard practice in genomic variant calling in both research and the clinic. Recently the toolkit has been rapidly evolving. Significant computational performance improvements have been introduced in GATK3.8 through collaboration with Intel in 2017. The first release of GATK4 in early 2018 revealed rewrites in the code base, as the stepping stone toward a Spark implementation. As the software continues to be a moving target for optimal deployment in highly productive environments, we present a detailed analysis of these improvements, to help the community stay abreast with changes in performance. RESULTS: We re-evaluated multiple options, such as threading, parallel garbage collection, I/O options and data-level parallelization. Additionally, we considered the trade-offs of using GATK3.8 and GATK4. We found optimized parameter values that reduce the time of executing the best practices variant calling procedure by 29.3% for GATK3.8 and 16.9% for GATK4. Further speedups can be accomplished by splitting data for parallel analysis, resulting in run time of only a few hours on whole human genome sequenced to the depth of 20X, for both versions of GATK. Nonetheless, GATK4 is already much more cost-effective than GATK3.8. Thanks to significant rewrites of the algorithms, the same analysis can be run largely in a single-threaded fashion, allowing users to process multiple samples on the same CPU. CONCLUSIONS: In time-sensitive situations, when a patient has a critical or rapidly developing condition, it is useful to minimize the time to process a single sample. In such cases we recommend using GATK3.8 by splitting the sample into chunks and computing across multiple nodes. The resultant walltime will be nnn.4 hours at the cost of $41.60 on 4 c5.18xlarge instances of Amazon Cloud. 
For cost-effectiveness of routine analyses or for large population studies, it is useful to maximize the number of samples processed per unit time. Thus we recommend GATK4, running multiple samples on one node. The total walltime will be ~34.1 hours on 40 samples, with 1.18 samples processed per hour at the cost of $2.60 per sample on c5.18xlarge instance of Amazon Cloud.
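The throughput and cost figures quoted for the GATK4 configuration can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the abstract (40 samples, 34.1 hours, $2.60 per sample). The variable names are illustrative, and the hourly instance rate is derived from those figures rather than taken from any price list, so it should be read as an implied value, not a published one.

```python
# Figures quoted in the abstract (GATK4, multiple samples on one c5.18xlarge node).
samples = 40
walltime_hours = 34.1
cost_per_sample_usd = 2.60

# Throughput implied by the quoted walltime.
samples_per_hour = samples / walltime_hours

# Total spend, and the hourly instance rate those figures imply (derived, not quoted).
total_cost_usd = samples * cost_per_sample_usd
implied_hourly_rate_usd = total_cost_usd / walltime_hours

print(f"{samples_per_hour:.2f} samples/hour")
print(f"${total_cost_usd:.2f} total, ${implied_hourly_rate_usd:.2f} per instance-hour")
```

The computed throughput comes out slightly below the quoted 1.18 samples per hour. The small gap is presumably a matter of rounding in the reported walltime.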

Subjects

Genomics/methods , Software , Algorithms , Human Chromosomes/genetics , Human Genome , Haplotypes/genetics , High-Throughput Nucleotide Sequencing , Humans

Sentieon DNASeq Variant Calling Workflow Demonstrates Strong Computational Performance and Accuracy.

Kendig, Katherine I; Baheti, Saurabh; Bockol, Matthew A; Drucker, Travis M; Hart, Steven N; Heldenbrand, Jacob R; Hernaez, Mikel; Hudson, Matthew E; Kalmbach, Michael T; Klee, Eric W; Mattson, Nathan R; Ross, Christian A; Taschuk, Morgan; Wieben, Eric D; Wiepert, Mathieu; Wildman, Derek E; Mainzer, Liudmila S.

Front Genet ; 10: 736, 2019.

Article in English | MEDLINE | ID: mdl-31481971

ABSTRACT

As reliable, efficient genome sequencing becomes ubiquitous, the need for similarly reliable and efficient variant calling becomes increasingly important. The Genome Analysis Toolkit (GATK), maintained by the Broad Institute, is currently the widely accepted standard for variant calling software. However, alternative solutions may provide faster variant calling without sacrificing accuracy. One such alternative is Sentieon DNASeq, a toolkit analogous to GATK but built on a highly optimized backend. We conducted an independent evaluation of the DNASeq single-sample variant calling pipeline in comparison to that of GATK. Our results support the near-identical accuracy of the two software packages, showcase optimal scalability and great speed from Sentieon, and describe computational performance considerations for the deployment of DNASeq.

Copy number variant analysis using genome-wide mate-pair sequencing.

Smadbeck, James B; Johnson, Sarah H; Smoley, Stephanie A; Gaitatzes, Athanasios; Drucker, Travis M; Zenka, Roman M; Kosari, Farhad; Murphy, Stephen J; Hoppman, Nicole; Aypar, Umut; Sukov, William R; Jenkins, Robert B; Kearney, Hutton M; Feldman, Andrew L; Vasmatzis, George.

Genes Chromosomes Cancer ; 57(9): 459-470, 2018 Sep.

Article in English | MEDLINE | ID: mdl-29726617

ABSTRACT

Copy number variation (CNV) is a common form of structural variation detected in human genomes, occurring as both constitutional and somatic events. Cytogenetic techniques like chromosomal microarray (CMA) are widely used in analyzing CNVs. However, CMA techniques cannot resolve the full nature of these structural variations (i.e., the orientation and location of associated breakpoint junctions) and must be combined with other cytogenetic techniques, such as karyotyping or FISH, to do so. This makes the development of a next-generation sequencing (NGS) approach capable of resolving both CNVs and breakpoint junctions desirable. Mate-pair sequencing (MPseq) is an NGS technology designed to find large structural rearrangements across the entire genome. Here we present an algorithm capable of performing copy number analysis from mate-pair sequencing data. The algorithm uses a step-wise procedure involving normalization, segmentation, and classification of the sequencing data. The segmentation technique combines both read depth and discordant mate-pair reads to increase the sensitivity and resolution of CNV calls. The method is particularly suited to MPseq, which is designed to detect breakpoint junctions at high resolution. This allows for the classification step to accurately calculate copy number levels at the relatively low read depth of MPseq. Here we compare results for a series of hematological cancer samples that were tested with CMA and MPseq. We demonstrate comparable sensitivity to the state-of-the-art CMA technology, with the benefit of improved breakpoint resolution. The algorithm provides a powerful analytical tool for the analysis of MPseq results in cancer.

Subjects

Chromosome Aberrations , DNA Copy Number Variations/genetics , Human Genome/genetics , High-Throughput Nucleotide Sequencing , Algorithms , Chromosome Breakpoints , Gene Rearrangement , Humans , Tissue Array Analysis/methods

SVAtools for junction detection of genome-wide chromosomal rearrangements by mate-pair sequencing (MPseq).

Johnson, Sarah H; Smadbeck, James B; Smoley, Stephanie A; Gaitatzes, Athanasios; Murphy, Stephen J; Harris, Faye R; Drucker, Travis M; Zenka, Roman M; Pitel, Beth A; Rowsey, Ross A; Hoppman, Nicole L; Aypar, Umut; Sukov, William R; Jenkins, Robert B; Feldman, Andrew L; Kearney, Hutton M; Vasmatzis, George.

Cancer Genet ; 221: 1-18, 2018 Feb.

Article in English | MEDLINE | ID: mdl-29405991

ABSTRACT

Mate-pair sequencing (MPseq), using long-insert, paired-end genomic libraries, is a powerful next-generation sequencing-based approach for the detection of genomic structural variants. SVAtools is a set of algorithms to detect both chromosomal rearrangements and large (>10 kb) copy number variants (CNVs) in genome-wide MPseq data. SVAtools can also predict gene disruptions and gene fusions, and characterize the genomic structure of complex rearrangements. To illustrate the power of SVAtools' junction detection methods to provide comprehensive molecular karyotypes, MPseq data were compared against a set of samples previously characterized by traditional cytogenetic methods. Karyotype, FISH and chromosomal microarray (CMA), performed for 29 patients in a clinical laboratory setting, collectively revealed 285 breakpoints in 87 rearrangements. The junction detection methods of SVAtools detected 87% of these breakpoints compared to 48%, 42% and 57% for karyotype, FISH and CMA respectively. Breakpoint resolution was also reported to 1 kb or less and additional genomic rearrangement complexities not appreciable by standard cytogenetic techniques were revealed. For example, 63% of CNVs detected by CMA were shown by SVAtools' junction detection to occur secondary to a rearrangement other than a simple deletion or tandem duplication. SVAtools with MPseq provides comprehensive and accurate whole-genome junction detection with improved breakpoint resolution, compared to karyotype, FISH, and CMA combined. This approach to molecular karyotyping offers considerable diagnostic potential for the simultaneous detection of both novel and recurrent genomic rearrangements in hereditary and neoplastic disorders.

Subjects

Gene Fusion/genetics , Human Genome/genetics , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Chromosome Aberrations , Humans

Comparison of Whole-Genome Sequencing Methods for Analysis of Three Methicillin-Resistant Staphylococcus aureus Outbreaks.

Cunningham, Scott A; Chia, Nicholas; Jeraldo, Patricio R; Quest, Daniel J; Johnson, Julie A; Boxrud, Dave J; Taylor, Angela J; Chen, Jun; Jenkins, Gregory D; Drucker, Travis M; Nelson, Heidi; Patel, Robin.

J Clin Microbiol ; 55(6): 1946-1953, 2017 Jun.

Article in English | MEDLINE | ID: mdl-28404677

ABSTRACT

Whole-genome sequencing (WGS) can provide excellent resolution in global and local epidemiological investigations of Staphylococcus aureus outbreaks. A variety of sequencing approaches and analytical tools have been used; it is not clear which is ideal. We compared two WGS strategies and two analytical approaches to the standard method of SmaI restriction digestion pulsed-field gel electrophoresis (PFGE) for typing S. aureus. Forty-two S. aureus isolates from three outbreaks and 12 reference isolates were studied. Near-complete genomes, assembled de novo with paired-end and long-mate-pair (8 kb) libraries, were first assembled and analyzed utilizing an in-house assembly and analytical informatics pipeline. In addition, paired-end data were assembled and analyzed using a commercial software package. Single nucleotide polymorphism (SNP) analysis was performed using the in-house pipeline. Two assembly strategies were used to generate core genome multilocus sequence typing (cgMLST) data. First, the near-complete genome data generated with the in-house pipeline were imported into the commercial software and used to perform cgMLST analysis. Second, the commercial software was used to assemble paired-end data, and resolved assemblies were used to perform cgMLST. Similar isolate clustering was observed using SNP calling and cgMLST, regardless of data assembly strategy. All methods provided more discrimination between outbreaks than did PFGE. Overall, all of the evaluated WGS strategies yielded statistically similar results for S. aureus typing.

Subjects

Disease Outbreaks , Methicillin-Resistant Staphylococcus aureus/classification , Methicillin-Resistant Staphylococcus aureus/genetics , Molecular Epidemiology/methods , Molecular Typing/methods , Staphylococcal Infections/epidemiology , Whole Genome Sequencing/methods , Cluster Analysis , Computational Biology/methods , Humans , Methicillin-Resistant Staphylococcus aureus/isolation & purification , Staphylococcal Infections/microbiology

Identification of independent primary tumors and intrapulmonary metastases using DNA rearrangements in non-small-cell lung cancer.

Murphy, Stephen J; Aubry, Marie-Christine; Harris, Faye R; Halling, Geoffrey C; Johnson, Sarah H; Terra, Simone; Drucker, Travis M; Asiedu, Michael K; Kipp, Benjamin R; Yi, Eunhee S; Peikert, Tobias; Yang, Ping; Vasmatzis, George; Wigle, Dennis A.

J Clin Oncol ; 32(36): 4050-8, 2014 Dec 20.

Article in English | MEDLINE | ID: mdl-25385739

ABSTRACT

PURPOSE: Distinguishing independent primary tumors from intrapulmonary metastases in non-small-cell carcinoma remains a clinical dilemma with significant clinical implications. Using next-generation DNA sequencing, we developed a chromosomal rearrangement-based approach to differentiate multiple primary tumors from metastasis. METHODS: Tumor specimens from patients with known independent primary tumors and metastatic lesions were used for lineage test development, which was then applied to multifocal tumors. Laser capture microdissection was performed separately for each tumor. Genomic DNA was isolated using direct in situ whole-genome amplification methodology, and next-generation sequencing was performed using an Illumina mate-pair library protocol. Sequence reads were mapped to the human genome, and primers spanning the fusion junctions were used for validation polymerase chain reaction. RESULTS: A total of 41 tumor samples were sequenced (33 adenocarcinomas [ADs] and eight squamous cell carcinomas [SQCCs]), with a range of three to 276 breakpoints per tumor identified. Lung tumors predicted to be independent primary tumors based on different histologic subtype did not share any genomic rearrangements. In patients with lung primary tumors and paired distant metastases, shared rearrangements were identified in all tumor pairs, emphasizing the patient specificity of identified breakpoints. Multifocal AD and SQCC samples were reviewed independently by two pulmonary pathologists. Concordance between histology and genomic data occurred in the majority of samples. Discrepant tumor samples were resolved by genome sequencing. CONCLUSION: A diagnostic lineage test based on genomic rearrangements from mate-pair sequencing demonstrates promise for distinguishing independent primary from metastatic disease in lung cancer.

Subjects

Non-Small-Cell Lung Carcinoma/diagnosis , Non-Small-Cell Lung Carcinoma/secondary , Gene Rearrangement , Lung Neoplasms/diagnosis , Lung Neoplasms/secondary , DNA Sequence Analysis/methods , Non-Small-Cell Lung Carcinoma/genetics , Non-Small-Cell Lung Carcinoma/pathology , Gene Dosage , Humans , Laser Capture Microdissection , Lung Neoplasms/genetics , Lung Neoplasms/pathology

BIMA V3: an aligner customized for mate pair library sequencing.

Drucker, Travis M; Johnson, Sarah H; Murphy, Stephen J; Cradic, Kendall W; Therneau, Terry M; Vasmatzis, George.

Bioinformatics ; 30(11): 1627-9, 2014 Jun 01.

Article in English | MEDLINE | ID: mdl-24526710

ABSTRACT

Mate pair library sequencing is an effective and economical method for detecting genomic structural variants and chromosomal abnormalities. Unfortunately, the mapping and alignment of mate-pair read pairs to a reference genome is a challenging and time-consuming process for most next-generation sequencing alignment programs. Large insert sizes, introduction of library preparation protocol artifacts (biotin junction reads, paired-end read contamination, chimeras, etc.) and presence of structural variant breakpoints within reads increase mapping and alignment complexity. We describe an algorithm that is up to 20 times faster and 25% more accurate than popular next-generation sequencing alignment programs when processing mate pair sequencing.

Subjects

Algorithms , Gene Library , Sequence Alignment/methods , DNA Sequence Analysis/methods , Genome , Genomic Structural Variation , High-Throughput Nucleotide Sequencing , Software

A simple method for gene phasing using mate pair sequencing.

Cradic, Kendall W; Murphy, Stephen J; Drucker, Travis M; Sikkink, Robert A; Eberhardt, Norman L; Neuhauser, Claudia; Vasmatzis, George; Grebe, Stefan K G.

BMC Med Genet ; 15: 19, 2014 Feb 06.

Article in English | MEDLINE | ID: mdl-24502676

ABSTRACT

BACKGROUND: Recessive genes cause disease when both copies are affected by mutant loci. Resolving the cis/trans relationship of variations has been an important problem both for researchers and, increasingly, for clinicians. Of particular concern are patients who have two heterozygous disease-causing mutations and could be diagnosed as affected (one mutation on each allele) or as phenotypically normal (both mutations on the same allele). Several methods are currently used to phase genes; however, due to cost, complexity and/or low sensitivity, they are not suitable for clinical purposes. METHODS: Long-range amplification was used to select and enrich the target gene (CYP21A2) followed by modified mate-pair sequencing. Fragments that mapped coincidently to two heterozygous sites were identified and used for statistical analysis. RESULTS: Probabilities for cis/trans relationships between heterozygous positions were calculated along with 99% confidence intervals over the entire length of our 10 kb amplicons. The quality of phasing was closely related to the depth of coverage and the number of erroneous reads. Most of the error was found to have been introduced by recombination in the PCR reaction. CONCLUSIONS: We have developed a simple method utilizing massively parallel sequencing that is capable of resolving two alleles containing multiple heterozygous positions. This method stands out among other phasing tools because it provides quantitative results allowing confident haplotype calls.

Subjects

Haplotypes/genetics , Sequence Analysis/methods , Heterozygote , Polymerase Chain Reaction , Probability , Research Design , Steroid 21-Hydroxylase/genetics
